Efficient text-independent speaker verification with structural Gaussian mixture models and neural network

Authors

  • Bing Xiang
  • Toby Berger
Abstract

We present an integrated system that combines structural Gaussian mixture models (SGMMs) with a neural network to achieve both computational efficiency and high accuracy in text-independent speaker verification. A structural background model (SBM) is constructed first by hierarchically clustering all Gaussian mixture components in a universal background model (UBM). In this way the acoustic space is partitioned into multiple regions at different levels of resolution. For each target speaker, an SGMM can be generated through multilevel maximum a posteriori (MAP) adaptation from the SBM. During testing, only a small subset of the Gaussian mixture components is scored for each feature vector, which significantly reduces the computational cost. Furthermore, the scores obtained in different layers of the tree-structured models are combined via a neural network for the final decision. Different configurations are compared in experiments conducted on the telephony speech data used in the NIST speaker verification evaluation. The experimental results show that a computational reduction by a factor of 17 can be achieved with a 5% relative reduction in equal error rate (EER) compared with the baseline. The SGMM-SBM also shows some advantages over the recently proposed hash GMM, including higher speed and better verification performance. EDICS: 1-SPEA
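
The idea of scoring only a small subset of components per feature vector can be illustrated with a toy two-level tree. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the tree depth, the clustering criterion, or how many nodes are kept at each level, so the two-level layout, the `top_n` parameter, and all names and values below (`tree_score`, `make_node`, `NODES`, and so on) are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def full_score(x, leaves):
    """Baseline: evaluate every leaf component (weight, mean, var)."""
    return logsumexp([math.log(w) + log_gauss(x, m, v) for w, m, v in leaves])


def tree_score(x, nodes, top_n=1):
    """Score the parent nodes, then only the children of the top_n best parents.

    Returns (approximate log-likelihood, number of Gaussians evaluated).
    """
    ranked = sorted(
        nodes, key=lambda n: log_gauss(x, n["mean"], n["var"]), reverse=True
    )
    evaluated = len(nodes)  # one evaluation per parent node
    terms = []
    for node in ranked[:top_n]:
        for w, m, v in node["children"]:
            terms.append(math.log(w) + log_gauss(x, m, v))
            evaluated += 1
    return logsumexp(terms), evaluated


def make_node(center, offsets, weight):
    """Hypothetical helper: a parent Gaussian with children placed around it."""
    children = [
        (weight, [center[0] + dx, center[1] + dy], [1.0, 1.0])
        for dx, dy in offsets
    ]
    return {"mean": list(center), "var": [4.0, 4.0], "children": children}


# Toy model (assumed values): two well-separated regions, four leaves each.
OFFSETS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
NODES = [
    make_node((0.0, 0.0), OFFSETS, 0.125),
    make_node((10.0, 10.0), OFFSETS, 0.125),
]
LEAVES = [child for node in NODES for child in node["children"]]

x = [0.5, -0.2]
approx, n_eval = tree_score(x, NODES)
exact = full_score(x, LEAVES)
print(n_eval, len(LEAVES), exact - approx)
```

In this toy case the tree search evaluates 6 Gaussians (2 parents plus 4 children) instead of all 8 leaves, and the discarded region contributes a negligible amount to the likelihood. The saving is small here only because the model is tiny; with many leaves per node and several levels, the number of evaluations per frame grows far more slowly than the total component count.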

Similar resources

Text-independent Speaker Verification

This paper describes one type of biometric system: text-independent speaker verification. It discusses the different stages of speaker verification in text-independent systems and also mentions other systems for speaker verification. Each stage has its subparts, which are discussed as well. The methods for speaker verification are presented in the article. Feature extraction from t...

Speaker Recognition Using Gaussian Mixtures Models

A speech signal contains several levels of information. At the first level, it carries the spoken message. At a second level, it also conveys information about the speaker's identity, emotional state, and so on. The task of speaker recognition can be divided into two parts: speaker identification and speaker verification. Speaker identification is answering the question which one o...

Investigation of Frame Alignments for GMM-based Text-prompted Speaker Verification

Frame alignment plays an important role in GMM-based speaker verification. In text-prompted speaker verification, it is common practice to use the transcriptions to align speech frames to phonetic units. In this paper, we compare the performance of alignments from a hidden Markov model (HMM) and a deep neural network (DNN), using the same training data and phonetic units. We incorporate a pho...

Probabilistic Neural Networks Combined with GMMs for Speaker Recognition over Telephone Channels

In this paper we study the applicability of Probabilistic Neural Networks (PNNs) as core classifiers to medium scale speaker recognition over fixed telephone networks. In particular, banking applications with up to 400 enrolled speakers and short training times are targeted. Two PNN-based open-set text-independent systems for Speaker Identification and Speaker Verification correspondingly are p...

Compute Efficient Training Method for Gaussian Mixture Model Based Speaker Verification

Speaker verification is a memory- and compute-intensive process, which raises area and latency concerns for its System-on-a-Chip implementation. The training schemes for computing the speaker models contribute significantly to the overall complexity of implementing the system. In this paper, we demonstrate that the K-Means algorithm can be used to realize compute efficient train...

Journal:
  • IEEE Trans. Speech and Audio Processing

Volume 11  Issue 

Pages  -

Publication date: 2003